Abstract

Background: Compared to food patterns, nutrient patterns have rarely been used, particularly at the international level. We studied,
in the context of a multi-center study with heterogeneous data, the methodological challenges regarding pattern analyses.

Methodology/Principal Findings: We identified nutrient patterns from food frequency questionnaires (FFQ) in the European
Prospective Investigation into Cancer and Nutrition (EPIC) Study and used 24-hour dietary recall (24-HDR) data to validate and 
describe the nutrient patterns and their related food sources. Associations between lifestyle factors and the nutrient patterns 
were also examined. Principal component analysis (PCA) was applied on 23 nutrients derived from country-specific FFQ 
combining data from all EPIC centers (N =477,312). Harmonized 24-HDRs available for a representative sample of the EPIC 
populations (N = 34,436) provided accurate mean group estimates of nutrients and foods by quintiles of pattern scores, 
presented graphically. An overall PCA combining all data captured a good proportion of the variance explained in each EPIC 
center. Four nutrient patterns were identified explaining 67% of the total variance: Principal component (PC) 1 was characterized
by a high contribution of nutrients from plant food sources and a low contribution of nutrients from animal food sources; PC2 by
a high contribution of micro-nutrients and proteins; PC3 was characterized by polyunsaturated fatty acids and vitamin D; PC4 was
characterized by calcium, proteins, riboflavin, and phosphorus. The nutrients with high loadings on a particular pattern as derived 
from country-specific FFQ also showed high deviations in their mean EPIC intakes by quintiles of pattern scores when estimated 
from 24-HDR. Center and energy intake explained most of the variability in pattern scores. 

Conclusion/Significance: The use of 24-HDR enabled internal validation and facilitated the interpretation of the nutrient
patterns derived from FFQs in terms of food sources. These outcomes open research opportunities and perspectives for using
nutrient patterns in future studies, particularly at the international level.
Introduction

Dietary pattern analyses are a complementary strategy to the 
traditional single-food or nutrient approach for capturing the 
intrinsic complexity of diet, the inter-relationships between its 
different components and the heterogeneity in food and nutrient
patterns existing within and between populations [1,2]. Explor- 
atory dimension reduction methods have been increasingly used to 
derive empirical dietary patterns (using principal components 
analysis or factor analysis) and enabled the identification of dietary 
patterns, e.g. "Western", "Mediterranean" or "Prudent" diet, 
which are potentially associated with different chronic diseases, 
including cancer [2-5]. These multivariate approaches aim to 
summarize a large number of correlated dietary variables (foods, 
food groups, nutrients or biomarkers) into fewer independent 
components explaining most of the dietary variability despite large 
within- and between-subject variations [2,6-8]. 

Compared with food patterns analyses, limited work has been 
done on nutrient pattern analyses to date [9-22]. Although results 
from pattern analyses conducted on foods are easier to translate 
into public health recommendations [23,24], nutrient patterns 
studies have several advantages particularly in an international 
study context. Firstly, nutrients are to a large extent universal,
functionally not exchangeable and, in contrast to food patterns,
may characterize specific nutritional profiles that are easier to
compare across populations. Additionally, unlike foods, nutrients show a
limited number of non-consumers [25]. These specific features 
facilitate the statistical analyses, interpretation and generalization 
of nutrient patterns across populations. Furthermore, the nutrient 
pattern approach could better mirror a combination of bioactive 
nutrients in complex biological mechanisms associated with 
diseases as compared to the use of food patterns [11-21,26]. 
Finally, recent research emphasizes the use of nutritional 
biomarkers and metabolites in epidemiological studies [8,27,28] 
and nutrient patterns act as an interface between food patterns and 
the food metabolome integrating measurements of both diet and 
metabolism [29]. 

Among the studies on nutrient patterns available
[11-18,20,26,30], only one study has been performed at an
international level [21]. This may be because of a lack of both
standardized dietary methods and nutrient databases, and due to
specific methodological issues in collecting, analyzing and
interpreting dietary data and its association with disease [21,31].

The aim of this study was to identify nutrient patterns in one of
the largest cohort studies on diet and cancer and other
non-communicable diseases, the European Prospective Investigation
into Cancer and Nutrition cohort (EPIC), combining food
frequency questionnaire (FFQ) data from 10 countries. In
addition, we used 24-hour dietary recall (24-HDR) data for
internal validation of the nutrient patterns identified from the
FFQs, to interpret them and to illustrate their related food
sources across countries. Associations of socio-demographic and
lifestyle factors with these nutrient patterns were also examined.

Methods 

Study Population 

The EPIC study is a multi-center prospective cohort study 
designed to investigate the associations between diet, cancer and 
other chronic diseases across 10 European countries: Denmark, 
France, Germany, Greece, Italy, the Netherlands, Norway, Spain, 
Sweden, and the United Kingdom [32,33]. Participants were 
recruited between 1992 and 1998, and include 521,330 healthy 
men and women aged 35-70 years from 23 administrative EPIC 
centers according to different geographical areas, regions and 
towns. Exceptions were for France (health insurance members), 
Utrecht (The Netherlands) and Florence (Italy) (participants of 
Breast Cancer screening programmes), Oxford (United Kingdom) 
(mostly vegetarian volunteers), and some centers in Spain and Italy 
(mostly blood donors). The French, Naples (Italy) and Norwegian 
cohorts were composed only of female participants. Comprehen- 
sive details of the methods of recruitment and study design have 
been published elsewhere [31,33,34]. 

Measurement of Diet, Lifestyle Factors, Education and 
Height and Weight 

Usual diet was assessed for each individual at recruitment using
country-specific and validated dietary questionnaires [31].
Different types of questionnaires were used to capture
country-specific food habits: (1) self-administered quantitative
dietary questionnaires in Northern Italy, The Netherlands, Germany
and Greece; (2) semi-quantitative food-frequency questionnaires (FFQs)
(with the same standard portion(s) assigned to all subjects) in
Denmark, Norway, Naples (Italy), Umea (Sweden) and the
United Kingdom; and (3) combined dietary methods in
Malmo (Sweden), combining a short non-quantitative
food-frequency questionnaire with a 14-day record on hot meals
(lunches and dinners). We refer to these questionnaires as baseline
country-specific FFQs.
In addition, a single 24-HDR was collected between 1995 and
2000 using EPIC-Soft (IARC, Lyon, France), which was specially
designed to standardize the recall interviews [35]. The 24-HDRs
were collected from a stratified sample of 36,900 EPIC
participants (the Calibration Study), a random sample of 5-12%
(United Kingdom 1.5%) obtained from each of the EPIC cohorts
[35,36]. The 24-HDRs are used as reference measurements and
provide accurate mean estimates of nutrient and food intakes at
the population level [37]. More details on the rationale and
characteristics of the calibration study are given elsewhere
[34,36-38]. The 24-HDRs were collected by trained personnel in a
face-to-face interview, except in Norway, where they were
collected by telephone. Food portion sizes were estimated with
a common picture book and other methods including standard
units and household measures. The interviews were distributed
over seasons and days of the week [36]. All foods were classified
according to the common EPIC-Soft food classification as
described elsewhere [38].

Individual intakes of 23 nutrients, water, alcohol and total 
energy were estimated from the baseline country-specific FFQs 
and the 24-HDRs data using a common food composition 
database standardised across the countries involved in EPIC 
(EPIC Nutrient Database, ENDB), recently enriched with folate 
data [39,40]. Supplement use was not included in the calculation
of nutrient intakes.

Information on physical activity, history of tobacco smoking, 
alcohol consumption, and education was collected at baseline by 
questionnaires. Weight and height were self-reported in most 
centers by the participants during the 24-HDR interview [36]. 

Exclusion Criteria 

Among the 521,330 EPIC participants, 6,902 subjects were
excluded from the pattern analysis because they had missing
baseline dietary questionnaires. To prevent inclusion of extreme
values, 10,241 subjects were excluded because they were in the
lowest or highest 1% of the distribution of the ratio of
reported total energy intake to energy requirement. Additionally,
22,432 participants were excluded because they had a prevalent
cancer at any site at baseline other than non-melanoma skin
cancer or were lost during follow-up, and a further 4,443
participants were excluded for missing information on lifestyle
factors. These exclusions were made to be consistent with those
applied in EPIC diet-disease association studies. Statistical
pattern analyses were conducted on 477,312 participants,
including 34,436 participants
from the Calibration Study with 24-HDRs.
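
The participant flow described above can be checked with simple arithmetic. The following Python sketch subtracts the four reported exclusion groups from the full cohort; the variable names and the short labels are illustrative and are not taken from the study's own code.

```python
# Participant flow as reported in the text (labels are illustrative shorthand).
cohort_size = 521_330
exclusions = {
    "missing baseline dietary questionnaire": 6_902,
    "extreme ratio of energy intake to energy requirement": 10_241,
    "prevalent cancer at baseline or lost to follow-up": 22_432,
    "missing lifestyle information": 4_443,
}

analytic_sample = cohort_size - sum(exclusions.values())
print(analytic_sample)  # 477312, the N reported for the pattern analyses
```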

Statistical Analysis 

Nutrient pattern analyses were performed using Principal
Component Analysis (PCA) [41] based on the combined, but
country-specific, FFQ-derived intake of 23 nutrients. We refer to
this as an 'overall PCA'. Total fat was divided into
monounsaturated, polyunsaturated, saturated fatty acids and cholesterol,
whilst total available carbohydrates were divided into starch and
sugars (monosaccharides and disaccharides). Alcohol consumption
was considered as a main lifestyle factor and was not included in
the initial list of variables to derive nutrient patterns, as reported
elsewhere [15,42,43]. Besides, when alcohol was included in the
analysis, it was the only variable that contributed to the first
pattern defined and was found to be only weakly dependent on
other nutrients (Pearson correlation coefficients (log scale) of
alcohol ranged from r = -0.13 with sugar to r = 0.03 with
magnesium; all correlations were statistically significant).

Variables were log transformed (natural log) after comparing
various analysis options with regard to proportion of variance
captured. Log transformation also renders the variances and
covariances independent of scale. PCA was used with the
covariance matrix, rather than the correlation matrix. While the
correlation matrix is often used in the epidemiology literature, this
is not strictly PCA [44] and the justification of bringing all
measures on the same scale is irrelevant after log-transformation.

In order to capture variability of nutrient intakes independently
from variation in energy intake, nutrients (log variables) were
adjusted for alcohol-free energy before applying PCA using the
nutrient density method [45]. We did not adjust for
'Center/country' because our objective was to ascertain patterns across
Europe rather than within study centers. PCAs were conducted for
both sexes combined and separately. As comparable patterns were
observed in both sexes in PCA without alcohol included, the final
results are presented for both sexes combined. The number of
retained principal components (PC) or "patterns" was determined
taking into account the interpretation of the patterns, the
percentage of total variance explained and the visual inflections
in the scree-plots of eigenvalues [41]. The loadings represent
covariance between the nutrients and the patterns. Nutrients with
positive loadings were positively associated with a nutrient pattern,
while those with negative loadings were inversely associated.
Individual PC scores were then computed from each retained
pattern as the sum of products of the observed variables
(nutrient intakes [g/day]) multiplied by weights proportional to the
nutrient's loading on the pattern [41]. The scores have means of 0
but were not standardized to unit variance, so as to keep their
original variances (corresponding to their eigenvalues).
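
The steps described above (log transformation, energy adjustment, PCA on the covariance matrix, unstandardized scores) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's SAS implementation: the function name `nutrient_pca` is hypothetical, the nutrient density adjustment is read here as dividing each intake by alcohol-free energy (a subtraction on the log scale), and the data are synthetic. Strictly positive intakes are assumed, since the logarithm of zero is undefined.

```python
import numpy as np

def nutrient_pca(intakes, energy, n_components=4):
    """PCA on the covariance matrix of log nutrient densities.

    intakes: (n_subjects, n_nutrients) array of positive daily intakes.
    energy:  (n_subjects,) array of positive alcohol-free energy intakes.
    """
    # Dividing by energy (nutrient density) becomes a subtraction on the
    # log scale. This reading of the energy adjustment is an assumption.
    log_density = np.log(intakes) - np.log(energy)[:, None]
    centered = log_density - log_density.mean(axis=0)

    # Covariance (not correlation) matrix, as described in the text.
    cov = np.cov(centered, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(cov)

    # eigh returns eigenvalues in ascending order; sort them descending.
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]

    # Scores have mean 0 and keep their original variance (the eigenvalue).
    scores = centered @ eigenvectors[:, :n_components]
    explained = eigenvalues[:n_components] / eigenvalues.sum()
    return scores, eigenvalues[:n_components], explained


# Synthetic demonstration data only; these are not EPIC values.
rng = np.random.default_rng(0)
energy = rng.lognormal(mean=7.6, sigma=0.25, size=500)
intakes = energy[:, None] * rng.lognormal(mean=-3.0, sigma=0.4, size=(500, 6))
scores, eigenvalues, explained = nutrient_pca(intakes, energy)
```

Because the scores are not rescaled, the sample variance of each score column equals the corresponding eigenvalue, which is the property the text refers to.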

Comparison between centers. Separate PCAs were carried
out on the same variables by country and center and the results
were compared to the overall PCA. We aimed to calculate the
proportion of variance captured by k center-specific PCs which is
also captured by the PCs from the overall PCA ($B_k$), in other words
how much the center-specific and the overall PCA agreed.
Krzanowski's method was used [46], which is based on the
comparison of eigenspaces.

Let $u_1,\dots,u_k$ and $v_1,\dots,v_k$ be the PCs resulting from two distinct
PCAs, with $u_i \cdot u_j = v_i \cdot v_j = 0$ for all $i \neq j$ and
$u_i \cdot u_i = v_i \cdot v_i = 1$; $\sigma_1^2,\dots,\sigma_k^2$
are the eigenvalues (variances) corresponding to the PCs $u$. The $B_k$
measure the proportion of variance in the u-frame which is
retained when changing to the v-frame.
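
One expression consistent with these definitions is given below. It is offered as an assumption derived from the verbal description (variance along each u-axis, weighted by how much of that axis lies in the span of the v-axes), not as a quotation of [46].

```latex
B_k \;=\; \frac{\sum_{i=1}^{k}\sum_{j=1}^{k} \sigma_i^{2}\,\left(u_i \cdot v_j\right)^{2}}
               {\sum_{i=1}^{k} \sigma_i^{2}}
```

Under this form, $B_k = 1$ when the two sets of k PCs span the same subspace and $B_k = 0$ when the two subspaces are orthogonal.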

The overall PCA, combining data from all EPIC centers, captures
a good proportion of the variance explained in each
EPIC center (Figure 1). Note that since the first two eigenvalues
are relatively close, the order of the first two PCs
can change between centers. Hence agreement in the first PC was
low in some centers ($B_1 < 0.10$ for 4 centers), but good when at
least the first two components were combined. More than 75% of
the variance that would be captured by center-specific PCs was
captured by the PCs from the overall PCA ($B_k > 0.76$ for all $k > 2$,
$B_2 > 0.85$ for 23 of 27 centers). Retaining 4 or more PCs was
sufficient to capture at least 80% of the variance in any center
($B_k > 0.80$ for all $k \geq 4$). We conducted similar analyses to study sex
differences, and the difference between the sexes in each center was
quite small provided k>2 (figure not shown). With 23 centers
from 10 countries, EPIC accounts for a wide heterogeneity in diet
[25,47].
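
The effect of a change in the order of the first two PCs on the agreement measure can be illustrated with a small Python sketch. The function `variance_retained` is a hypothetical helper that assumes the measure is the eigenvalue-weighted sum of squared inner products between the two frames, divided by the sum of the eigenvalues; the frames and eigenvalues are toy values, not EPIC results.

```python
import numpy as np

def variance_retained(u, sigma2, v, k):
    """Share of the variance carried by the first k columns of u that is
    retained after projecting onto the first k columns of v.

    u, v:   (n_variables, n_components) arrays with orthonormal columns
            (e.g. center-specific PCs in u, overall PCs in v).
    sigma2: eigenvalues (variances) belonging to the columns of u.
    """
    u_k, v_k = u[:, :k], v[:, :k]
    s_k = np.asarray(sigma2, dtype=float)[:k]
    # Squared inner products (u_i . v_j)^2, summed over j for each i.
    overlap = ((u_k.T @ v_k) ** 2).sum(axis=1)
    return float((s_k * overlap).sum() / s_k.sum())


# Toy example: two frames whose first two axes are swapped.
u = np.eye(3)
v = np.eye(3)[:, [1, 0, 2]]
sigma2 = [3.0, 2.5, 0.5]
b1 = variance_retained(u, sigma2, v, k=1)  # first PCs are orthogonal
b2 = variance_retained(u, sigma2, v, k=2)  # first two PCs span the same plane
```

In this toy case the agreement is zero for the first PC alone and complete once two PCs are retained, which mirrors the behaviour described for centers where the first two eigenvalues are close.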
Description of nutrient patterns combining FFQ and 24-HDR
measurements. This analysis was performed on the
34,436 participants in the Calibration Study. We classified the
participants into 5 categories based on the quintiles of each PC
score. The 24-HDR mean intake for the ith nutrient, food or food
group, m(i), was calculated for participants in each quintile of the
PC scores. A generalized linear regression model was used to
estimate means adjusted for age, sex, height, weight,
country/center and total energy intake to correct for physiological
differences of the participants across the EPIC centers/countries.
Models were weighted for season and day of the week of recall to
control for differences in sampling procedures of the 24-HDR
interviews [36]. The overall "EPIC mean" intake, M(i), was also
calculated for the same nutrient, food or food group, as the mean
in the Calibration Study. To express differences between mean
intakes of the participants in each quintile category of PC scores
and the overall EPIC mean, the deviation of the nutrient or food
intake relative to the EPIC mean was calculated for each
nutrient/food, as: 100% × [m(i)/M(i)].
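
The relative intake calculation can be sketched as follows. This simplified Python example uses crude (unadjusted, unweighted) quintile means on synthetic data; the covariate adjustment and the weighting by season and day of recall described above are deliberately omitted, and the function name is hypothetical.

```python
import numpy as np

def relative_intake_by_quintile(pc_scores, intake):
    """Return 100 * m(i) / M(i) for each quintile of a PC score."""
    pc_scores = np.asarray(pc_scores, dtype=float)
    intake = np.asarray(intake, dtype=float)
    # Cut points at the 20th, 40th, 60th and 80th percentiles of the score.
    cuts = np.quantile(pc_scores, [0.2, 0.4, 0.6, 0.8])
    quintile = np.searchsorted(cuts, pc_scores, side="right")  # labels 0..4
    overall_mean = intake.mean()  # M(i): mean over all participants
    return np.array([
        100.0 * intake[quintile == q].mean() / overall_mean  # m(i) / M(i)
        for q in range(5)
    ])


# Synthetic illustration: intake rises linearly with the pattern score.
scores = np.arange(100, dtype=float)
intake = 50.0 + scores
relative = relative_intake_by_quintile(scores, intake)
```

Values above 100 indicate a quintile whose mean intake exceeds the overall mean, and values below 100 the reverse.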

A multi-dimensional "radar" graphic presentation of the 
relative nutrient and food intakes was used to illustrate contrasts 
in nutrient, food or food group intakes by quintiles of PC scores. 
EPIC means, used as the common denominator to calculate 
deviations, are indicated in each figure by a reference circle at 
100% and a range of 0-150%. If the relative consumption of a 
nutrient/food is above 100%, it indicates that the given quintile of 
PC score is characterized by a relatively high consumption of that 
nutrient/food compared with the reference EPIC mean, and vice
versa when the relative intake is below 100%. The end peaks of
means exceeding 150% are not reported in the graphs but are
indicated in Tables S4, S5, S6, S7, S8, S9, S10, S11.

Association of nutrient pattern scores with demographic
and lifestyle factors. Multiple linear regression models were
fitted for each of the PC scores on socio-demographic and lifestyle
characteristics at baseline: sex, age at recruitment (per 10 years,
continuous), BMI (continuous), log of total energy intake
(continuous), physical activity (by category: inactive, moderately
inactive, moderately active, active, unknown), smoking status (by
category: never, past, current smoker, unknown), educational level
(by category: none, primary school completed,
technical/professional school completed, secondary school completed, longer
education including university degree, not specified) and
country/center. The EPIC centers within a country were aggregated at
country level to reflect geographical regions that are presumed to
share common diets. In contrast, the UK participants were divided
into two groups: "general population" (Cambridge and Oxford centers)
and "health-conscious" (Oxford center, cohort of vegans and
ovo-lacto vegetarians) [48]. In all models, Spain was
chosen as the reference country as its dietary habits depict features
of both northern and southern European patterns. We present the
regression coefficients and their standard errors. Statistical
significance was defined using a 2-sided P-value<0.05. Partial
R-squared values were calculated to express the proportion of variance of PC
scores explained by each of the measured lifestyle variables given
the other independent variables in the model. For this analysis, PC
scores were standardized to have a variance of 1. All analyses were
performed using SAS software 9.3.
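
The idea of a partial R-squared can be illustrated with ordinary least squares in Python. The sketch below uses one common definition, the reduction in residual sum of squares when a variable is added to a model that already contains the other covariates, divided by the residual sum of squares of that reduced model. Whether this matches the exact SAS computation used in the study is an assumption; the data, variable names and effect sizes are synthetic.

```python
import numpy as np

def sse(X, y):
    """Residual sum of squares from an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def partial_r2(X_other, x_focus, y):
    """Partial R^2 of x_focus given the covariates in X_other."""
    n = len(y)
    intercept = np.ones((n, 1))
    reduced = np.hstack([intercept, X_other])
    # x_focus may be one column or several (e.g. dummy-coded categories).
    full = np.hstack([reduced, np.reshape(x_focus, (n, -1))])
    sse_reduced, sse_full = sse(reduced, y), sse(full, y)
    return (sse_reduced - sse_full) / sse_reduced


# Synthetic data only: a score driven weakly by age, strongly by log energy.
rng = np.random.default_rng(1)
n = 2000
age = rng.normal(size=n)
log_energy = rng.normal(size=n)
score = 0.2 * age - 0.6 * log_energy + rng.normal(size=n)
score = (score - score.mean()) / score.std()  # unit variance, as in the text

r2_energy = partial_r2(age[:, None], log_energy, score)
r2_age = partial_r2(log_energy[:, None], age, score)
```

A categorical factor such as country would enter as a block of dummy columns passed together as `x_focus`, so that its partial R-squared reflects the whole factor.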

Results 

Identification of the Nutrient Patterns (PC) 

Four nutrient patterns (or PC) were retained by the overall PCA
(N = 477,312 participants) and explained about 67% of the total
variance (total nutrient variability) (Table 1). Eigenvectors and
eigenvalues are presented in Table S1, available online. The 1st
PC identified had the largest negative loadings on saturated fatty
acids, cholesterol, vitamin B12, retinol, and vitamin D (all nutrients
of animal origin) and positive loadings for dietary fibre, vitamin C,
beta-carotene and folate (nutrients from plant sources, except for
folate which has a dominant plant but also animal origin). This
pattern accounted for 29% of variance in nutrient intakes.

The 2nd PC had the greatest positive loadings on vitamin B
complex (specifically riboflavin, B6, folate, B12), vitamin C,
beta-carotene, retinol, phosphorus, potassium and magnesium and
negative loading on starch. This pattern accounted for 22% of the
variance.

The 3rd PC accounted for 9% of the variance. Vitamin D had
the greatest loading of 0.7. Other nutrients contributing to a lesser
extent included PUFA, thiamin, vitamin B6 and fibre with
positive loadings and SFA and retinol with negative loadings.

The 4th and last PC retained accounted for 7% of the variance
and had the greatest positive loadings on calcium, total proteins,
riboflavin, and phosphorus and negative loadings on PUFA and
Vitamin E.
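
The rounded percentages quoted above can be reconciled with the cumulative figure of about 67% by summing the unrounded values from Table 1. The short Python sketch below does only this arithmetic; the variable names are illustrative.

```python
# Percent of total variance explained by each retained PC (from Table 1).
explained = [29.2, 21.8, 9.0, 7.3]

cumulative = []
running = 0.0
for share in explained:
    running += share
    cumulative.append(round(running, 1))

print(cumulative)  # [29.2, 51.0, 60.0, 67.3]
```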

Description of the Identified Nutrient Patterns Based on 
24-HDR Data 

Figures 2, 3, 4 and 5 show graphically the deviations of the adjusted
24-HDR mean intake of nutrients and foods/food groups by
quintiles of PC scores relative to their respective nutrient
and foods/food group overall EPIC mean intake. Corresponding
numbers and their deviations are presented online in Tables S4,
S5, S6, S7, S8, S9, S10, S11. The nutrients with high loadings on a
particular pattern (Table 1) also showed high deviations in their
mean intakes from the overall EPIC means by quintiles of pattern
scores as estimated from standardized 24-HDR.

PC1. In comparison with the overall EPIC mean, participants
in the 1st quintile of PC1 score were characterized by high intakes
of SFA, cholesterol, vitamin B12, vitamin D and retinol in contrast
to low intakes of dietary fibre, vitamin C and beta-carotene. When
compared to the EPIC means, participants in the 5th quintile of
score showed the opposite pattern (Figure 2; Table S4). When
considering their related food contributions, animal-based foods
dominated in the 1st quintile, including meat, processed meat,
butter, eggs and also coffee (Figure 2; Table S8). Mean intakes of
plant foods in the 1st quintile were lower than the EPIC means. In
contrast, participants in the 5th quintile were characterized by a
diet richer in plant foods (fruits, vegetables, fruit juices, soya
products, vegetable oils and tea) and lower in animal food intakes,
in comparison with the overall EPIC mean.

PC2. In the 1st quintile of PC2 score, intakes of vitamins B6,
B12, folate, riboflavin, vitamin C, beta-carotene, retinol,
phosphorus, potassium and magnesium were relatively low in
comparison with the overall EPIC mean, whereas they were high
in the 5th quintile (Figure 3; Table S5). Participants in the 5th
quintile of score had a diet rich in fruits, vegetables, fresh meat,
eggs, fish and tea, but low consumption of soft drinks, cakes, sugar
and butter relative to the EPIC means (Figure 3; Table S9).

PC3. A high mean intake of vitamin D and PUFA was
observed in the 5th quintile of score, higher than the EPIC mean
by respectively 24% and 5%, while in the 1st quintile the mean
intake was respectively 16% and 8% below the EPIC mean
(Figure 4; Table S6). Regarding food consumption, participants
in the 5th quintile of score had a diet with a higher
consumption of fish and soya products but also oils, fruits and
vegetables and cereals in comparison with the EPIC means. Fish
and soy product intakes in this quintile were respectively 24% and
35% higher than the EPIC mean, while respectively 21% and
16% lower in the 1st quintile (Figure 4; Table S10).

Figure 1. Proportion of the variance in each EPIC center captured in an overall PCA on combined data by the number of PC
retained.

PC4. In comparison with the EPIC means, this pattern was
characterized by high intakes of PUFA, beta-carotene, retinol and
vitamin E in the 1st quintile, with corresponding low intakes in the
5th quintile. Calcium, vitamin B12, riboflavin, phosphorus,
potassium and total protein intakes were much lower in the 1st
quintile and higher than the EPIC mean by up to 12% in the 5th
quintile. In terms of foods, dairy product consumption, especially
milk, increased from the 1st to the 5th quintile, while soy products had
high consumption in the 1st quintile. Besides, intake of fish was
relatively high in the 5th quintile (Figure 5; Tables S7 and S11).

Demographic and Lifestyle Factors Associated with the 
Identified Nutrient Patterns 

Tables 2 and 3 show the regression coefficients and partial
R-squared of individual PC scores for each of the four patterns
retained for demographic and lifestyle factors, country of
recruitment and energy intake. Corresponding mean values of
baseline factors by PC quintile are presented in Table S2. Country
and total energy intake were the most important measured
predictors for the four retained PC scores (Table 3). Country
accounted for more than 12% of the variability of each PC, with
the least contribution to PC4 (12%) and the greatest to PC1 (24%).
The distribution of participants by country and quintiles of pattern
scores is presented in Table S3. Variability attributable to total
energy ranged from 1% (PC3) to 4% (PC1).

Study participants with high scores on PC1 were more likely to
be female, had a higher education, were more often former
smokers and less frequently current smokers, had a higher level of
physical activity, were older, and had a lower energy intake and a lower
BMI than participants with lower scores. Participants living in
Greece and those in the UK health-conscious group had higher overall
scores as compared to Spain (reference category). The remaining countries
had lower scores (Table 2).

Participants with high scores on PC2 were more likely to be
female, former smokers, better educated and physically active,
and to have a lower total energy intake. As compared to Spain,
participants from the rest of the countries in the cohort had higher
scores with the exception of participants from Italy. The
socio-demographic characteristics of individuals with higher scores
on PC1 and PC2 were relatively similar. PC3 score was positively
associated with age, BMI and former smoking and was inversely
associated with female sex, lower education, lower levels of
physical activity, current smoking and total energy intake. The
Nordic countries (Norway and Sweden) had the highest scores
followed by Spain. PC4 score was positively associated with age,
BMI, higher education and smoking (both current and former
smokers) but negatively associated with female sex and total
energy intake. As compared to Spain, all the other countries had
lower scores (Table 2).

Discussion 

We identified four nutrient patterns using PCA across the 23 
European centers participating in the EPIC study. We showed the 
applicability of an overall PCA combining all data since nutrient 
patterns revealed themselves to be reproducible across EPIC 
centers. We then used the standardized 24-HDRs collected in a 
representative sub-sample of the EPIC study to describe these 
patterns and depict their related food sources. The use of 24-HDR
allowed internal validation of the patterns obtained using the FFQ
data: the 24-HDRs provide good mean estimates at the population
level in a comparable way across countries [48]. Our analysis was
therefore focused on the comparison of mean dietary intakes
within each quintile of pattern scores. Additionally, we
investigated the relationship between the nutrient patterns and
socio-demographic and lifestyle characteristics of the participants.

Table 1. Loading matrix and explained variances for the first
four Principal Components (PC) identified by PCA*.

Nutrient variables                         PC1     PC2     PC3     PC4
Total proteins                           -0.10    0.41    0.08    0.55
Saturated Fatty Acids (SFA)              -0.48    0.05   -0.32   -0.18
Monounsaturated Fatty Acids (MUFA)       -0.06   -0.12   -0.24   -0.12
Polyunsaturated Fatty Acids (PUFA)        0.09    0.25    0.26   -0.37
Cholesterol                              -0.57    0.30   -0.17    0.25
Starch                                   -0.05   -0.35    0.22   -0.15
Sugars                                    0.30    0.14    0.02    0.15
Dietary fibre                             0.57    0.33    0.26   -0.04
Thiamin                                   0.32    0.43    0.32    0.22
Riboflavin                                0.06    0.60   -0.12    0.51
Vitamin B6                                0.37    0.51    0.25    0.36
Folate (Vitamin B9)                       0.59    0.59    0.03    0.16
Vitamin B12                              -0.57    0.54   -0.20    0.39
Vitamin C                                 0.66    0.42   -0.02    0.11
Beta-carotene (β-carotene)                0.60    0.66   -0.12   -0.27
Retinol                                  -0.73    0.48   -0.26   -0.26
Vitamin E                                 0.41    0.28    0.10   -0.35
Vitamin D                                -0.55    0.41    0.70   -0.06
Calcium                                   0.14    0.35   -0.16    0.45
Phosphorus                                0.11    0.49    0.06    0.48
Iron                                      0.34    0.34    0.00    0.17
Potassium                                 0.42    0.59    0.21    0.36
Magnesium                                 0.30    0.47    0.15    0.23
Proportion of explained variance (%)      29.2    21.8     9.0     7.3
Cumulative explained variance (%)         29.2    51.0    60.0    67.3

*Estimates from an EPIC-wide PCA done on the country-specific FFQ-derived
intake levels of 23 nutrients (log-transformed and energy adjusted using the
energy density method, using alcohol-free energy).
doi:10.1371/journal.pone.0098647.t001

For this nutrient pattern analysis, we benefited from the unique
features of the EPIC cohort, involving a European study
population with a large geographical spread and high
heterogeneity in dietary intakes and patterns [31]. The EPIC study offered
an ideal setting to address a series of methodological challenges
such as normalisation, transformation and scaling of variables,
energy adjustment, and the handling of heterogeneous data between
centers and sexes when implementing dimension reduction methods such
as PCA. The EPIC study also offered the opportunity to use two
complementary dietary assessment methods (FFQ and 24-HDRs)
to identify and describe the patterns. This internal approach has
also been used in the Framingham Study to describe clusters defined
on FFQ data with mean intakes of nutrients derived from an
independent 3-day food record [49].

All studies published so far on nutrient patterns were conducted
at the national level in different geographic areas and populations,
except one combining data from 5 case-control studies [21]. These
previous studies consistently identified a nutrient pattern labeled as
"meat" [10,19], "high-meat" [13,18], "animal products"
[9,11,15,16] or "animal products and cereals" [21], which was
characterized by nutrients from animal food sources. In our study
we identified a pattern characterized by positive loadings of
nutrients essentially from plant food sources and negative loadings
of nutrients that tend to be correlated at the individual level with
animal food sources. Second, previous studies have also
consistently identified a nutrient pattern labelled as "fiber and vitamins"
[9,11,15-17,20,22,30] or "vitamins-rich" [14] or "antioxidant
vitamins and fiber" [21], characterized by a diet rich in vitamins
and minerals and sharing similar features with our 2nd pattern with
high loadings on a number of micro-nutrients and proteins. Our
PC3 has similar features with the "polyunsaturated fatty acids and
vitamin D" pattern reported elsewhere [22], with high loadings on
Vitamin D and PUFAs.




Figure 2. Deviation (%) of the 24-HDR mean intakes from the overall EPIC means among participants in the quintiles of PC1 scores
for nutrients (A) and foods (B). Means are adjusted for age, sex, height, weight and energy and weighted for day and season of recall (N = 34,436).
The reference circle of radius 100% corresponds to the 'EPIC means' and the spikes indicate the deviation of the specific nutrient mean in
quintiles of pattern scores from the reference 'EPIC means'.
doi:10.1371/journal.pone.0098647.g002
Figure 3. Deviation (%) of the 24-HDR mean intakes from the overall 'EPIC means' among participants in the quintiles of PC2 scores
for nutrients (A) and foods (B). Means are adjusted for age, sex, height, weight and energy and weighted for day and season of recall (N = 34,436).
The reference circle of radius 100% corresponds to the 'EPIC means' and the spikes indicate the deviation of the specific nutrient mean in
quintiles of pattern scores from the reference 'EPIC means'.
doi:10.1371/journal.pone.0098647.g003



Compared to foods, nutrients are to a large extent universal; they 
are absorbed, although with some variability, whatever the food 
consumed, and are functionally not exchangeable. In contrast to food 
patterns, nutrient patterns may characterize specific nutritional 
profiles in a way that makes populations easier to compare. This 
approach is particularly useful for identifying combinations of 
nutrients that could reflect possible biological mechanisms. Despite 
the heterogeneity in the foods consumed within and between 
individuals and study populations in the EPIC cohort [48], PC1 
and PC2 were driven by nutrients that can be found in many food 
groups and were therefore independent of the food groups they 
came from. They reflect a broad range of food sources and thus 
the most prevalent types of dietary patterns, which explain the 
largest proportion of the variance (51%). In contrast, the 3rd and 
the 4th patterns are more related to specific food sources where 
variation is less pronounced, i.e. fish and soy products for PC3 
(high contribution of vitamin D and PUFA) and milk for PC4 
(high contribution of calcium, phosphorus, proteins, riboflavin). 

The first four PCs retained in our analysis explained a high 
proportion of the total variance in the original data (67%), higher 
than those reported in food pattern analysis: the percentage of 
variance explained by the first PC is relatively high when 
compared to that reported in studies of dietary patterns on the 
same data defined using similar methods [50,51]. This is probably 
due to the use of nutrients rather than foods or food groups 
as variables in the multivariate analyses [15]. The percentage of 
explained variance in our study is comparable to that reported in 
other studies on nutrient-based patterns. 

Figure 4. Deviation (%) of the 24-HDR mean intakes from the overall 'EPIC means' among participants in the quintiles of PC3 scores 
for nutrients (A) and foods (B). Means are adjusted for age, sex, height, weight and energy and weighted for day and season of recall (N = 34,436). 
The reference circle (radius 100%) corresponds to the 'EPIC means', and the spikes indicate the deviation of the specific nutrient mean in 
quintiles of pattern scores from the reference 'EPIC means'. 
doi:10.1371/journal.pone.0098647.g004 

Figure 5. Deviation (%) of the 24-HDR mean intakes from the overall 'EPIC means' among participants in the quintiles of PC4 scores 
for nutrients (A) and foods (B). Means are adjusted for age, sex, height, weight and energy and weighted for day and season of recall (N = 34,436). 
The reference circle (radius 100%) corresponds to the 'EPIC means', and the spikes indicate the deviation of the specific nutrient mean in 
quintiles of pattern scores from the reference 'EPIC means'. 
doi:10.1371/journal.pone.0098647.g005 
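
The quantity plotted in Figures 2 to 5 can be illustrated with a minimal sketch: the mean intake within each quintile of a pattern score, expressed as a percentage of the overall mean, so that 100% corresponds to the reference circle. The function names and the toy data are hypothetical, and the sketch deliberately omits the covariate adjustment (age, sex, height, weight, energy) and the weighting for day and season of recall described in the captions.

```python
from statistics import mean

def quintile_labels(scores):
    """Assign each score a quintile from 1 to 5 by rank (ties broken by position)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    labels = [0] * len(scores)
    for rank, i in enumerate(order):
        labels[i] = rank * 5 // len(scores) + 1
    return labels

def percent_of_overall_mean(intakes, scores):
    """Mean intake per score quintile, as a percentage of the overall mean."""
    overall = mean(intakes)
    labels = quintile_labels(scores)
    result = {}
    for q in range(1, 6):
        group = [x for x, lab in zip(intakes, labels) if lab == q]
        result[q] = 100.0 * mean(group) / overall
    return result

# Toy data (invented for illustration): intake rises with the pattern score.
scores = [0.1 * i for i in range(10)]
intakes = [10.0 + i for i in range(10)]
deviation = percent_of_overall_mean(intakes, scores)
```

In this toy example the lowest quintile falls below 100% and the highest above it, which is the shape expected for a nutrient with a high positive loading on the pattern.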

In this study, nutrient intakes were derived from the usual food 
consumption data collected through country-specific FFQs which 
are prone to measurement errors and potentially introduce 
systematic between-country differences in nutrient assessment. 
The number of questions related to consumption of specific foods 
was adapted to local customs in the country-specific FFQs because 
these habits vary between countries [31]. The distribution of 
quintiles of pattern scores by countries or centers (Table S2) 
illustrated the heterogeneity in diet across EPIC centers already 
observed and reported before [25,47]. However, harmonized food 
composition tables across European countries were used to 
translate food into nutrient intakes, thus sizeably improving the 
comparability of nutrient intakes [39]. 

The use of dietary supplements was not included in the 
calculation of nutrient pattern scores. A previous study has shown 
some heterogeneity regarding the proportion of dietary supple- 
ment users in the EPIC Study, with a high consumption in 
northern countries [52]. In our analysis, we have depicted nutrient 
patterns from natural food sources only without having supple- 
ments included. Given the limited evidence on the protective and 
detrimental effects of food supplements, most of the nutritional 
recommendations and guidelines promote the use of a wide variety 
of foods above the use of food supplements [53]. In a sensitivity 
analysis, we have checked whether dietary supplement use 
(categorical variable: Yes, No, Unknown) contributes to the 
variability of each PC score, but the contribution was negligible 
(data not shown). 

The EPIC centers were identified as the main factor 
explaining the variability in PC scores (partial R2 analysis, Table 3). 
To capture the variability between the nutritional variables 
independently of a center effect on dietary measures, one solution 
would have been to use the consumption of nutrients adjusted for 
the center by subtracting the average center score, but this would 
have restricted the nutrient patterns to intra-center variation only. 
Combined data from all the EPIC centers (without adjustment for 
'center') was preferred, as the main objective of this analysis was 
to ascertain and compare patterns across Europe rather than 
within study centers. 
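
The center adjustment that was considered and not retained can be sketched as follows. This is an illustration only, with a hypothetical function name and invented values; it shows why subtracting each center's average leaves only intra-center variation.

```python
from collections import defaultdict
from statistics import mean

def center_adjust(values, centers):
    """Subtract each center's mean from the values recorded in that center."""
    by_center = defaultdict(list)
    for v, c in zip(values, centers):
        by_center[c].append(v)
    center_mean = {c: mean(vs) for c, vs in by_center.items()}
    return [v - center_mean[c] for v, c in zip(values, centers)]

# Invented nutrient values from two centers with very different average levels.
values = [10.0, 12.0, 14.0, 30.0, 32.0, 34.0]
centers = ["A", "A", "A", "B", "B", "B"]
adjusted = center_adjust(values, centers)
# adjusted == [-2.0, 0.0, 2.0, -2.0, 0.0, 2.0]
```

After adjustment the two centers are indistinguishable: the between-center contrast, which is what a Europe-wide comparison is interested in, has been removed.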

Energy intake was the second most important factor explaining 
variability in PC scores, despite the use of energy density 
normalization [45] prior to applying PCA. Normalization for 
total energy helps to remove variation due to body size and 
metabolic rate [45] and should have helped to reduce 
measurement errors in reported dietary intakes and increase 
nutrient pattern comparability across countries [34]. This does not 
contradict the possibility that those eating a high energy diet tend 
to eat a different pattern of foods and hence nutrients. 
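
As a rough illustration of energy normalization, the sketch below expresses a nutrient intake per 1000 kcal. This is one common form of nutrient density; that it matches the exact normalization of reference [45] is an assumption, and the function name and values are hypothetical.

```python
def nutrient_density(nutrient_intake, energy_kcal, per_kcal=1000.0):
    """Nutrient intake per `per_kcal` kilocalories of total energy intake."""
    if energy_kcal <= 0:
        raise ValueError("energy intake must be positive")
    return nutrient_intake * per_kcal / energy_kcal

# Two invented participants with the same diet composition but different
# total intake (for example because of body size): 20 g and 30 g of fiber.
small = nutrient_density(20.0, 2000.0)
large = nutrient_density(30.0, 3000.0)
# Both evaluate to 10.0 g per 1000 kcal.
```

The two participants receive the same density, so variation that reflects only the amount eaten is removed, while differences in diet composition are kept.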

The use of a PCA approach to define nutrient patterns in this 
project has advantages as compared to Factor analysis (FA). PCs 
are generated sequentially, meaning that the variance explained 
by the first factor is removed and the second factor is then 
generated to maximally explain the remaining variance. The 
definition of each factor is independent of the number of factors 
retained, which is not the case for FA. The PC scores are also 
orthogonal and the patterns are objective (no use of rotations). 
Besides, using PCA, nutrients could load on multiple patterns, 
which is not the case with FA. Although PCA complicates the 
interpretation of the patterns, this approach is particularly useful 
in the context of nutrient patterns in order to identify combinations 
of nutrients that could reflect possible biological mechanisms. 
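
The sequential construction of PCs described above can be sketched with power iteration and deflation: the leading component is extracted, its variance is removed from the covariance matrix, and the next component is extracted from what remains. This is a teaching sketch on invented data, not the software or settings used in the study; working on the covariance matrix rather than on standardized variables is an assumption, and all names are hypothetical.

```python
import math

def covariance_matrix(rows):
    """Sample covariance matrix of a list of equally long observation rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
            for i in range(p)]

def leading_component(cov, iterations=500):
    """Power iteration: largest eigenvalue of cov and its unit eigenvector."""
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * cov[i][j] * v[j] for i in range(p) for j in range(p))
    return eigenvalue, v

def sequential_pca(rows, n_components):
    """Extract components one at a time, deflating the covariance after each."""
    cov = covariance_matrix(rows)
    p = len(cov)
    components = []
    for _ in range(n_components):
        eigenvalue, v = leading_component(cov)
        components.append((eigenvalue, v))
        # Deflation: remove the variance explained by this component.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(p)]
               for i in range(p)]
    return components

# Invented data: six observations of three correlated variables.
rows = [[2.0, 1.0, 0.0], [4.0, 3.0, 1.0], [6.0, 2.0, 0.5],
        [8.0, 5.0, 1.5], [10.0, 4.0, 0.0], [12.0, 6.0, 1.0]]
components = sequential_pca(rows, 3)
total = sum(eigenvalue for eigenvalue, _ in components)
share = [eigenvalue / total for eigenvalue, _ in components]
```

Calling `sequential_pca(rows, 1)` returns the same first component as `sequential_pca(rows, 3)`, which illustrates that the definition of each PC does not depend on how many are retained, and the extracted directions are mutually orthogonal.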

Among the limitations related to the PCA approach are 
subjective decisions on how to interpret nutrient patterns. There 
are questions such as the choice of variables to include in the 
analysis, whether to transform and/or standardize the data, the 
number of components to retain and finally the threshold for 
factor loadings (i.e. in this analysis |0.45|) [41]. In addition, 
patterns identified do not provide an immediate picture of exactly 
what is being consumed, as the same scores may be obtained with 
different combinations of nutrients or different quantities of foods, 
which may be high or low in nutrient density. This method can be 
influenced by the way in which nutrients are grouped, as this 
may obscure the patterns within subpopulations or artificially 
separate them based on intercorrelations of uniquely consumed 
foods [54]. 
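
One of the subjective choices listed above, the loading threshold, can be made concrete with a short sketch. The loadings below are invented for illustration and are not results from this study; treating a loading exactly equal to the threshold as "high" is also an assumption, as is the function name.

```python
def high_loading_nutrients(loadings, threshold=0.45):
    """Nutrients whose absolute loading reaches the threshold, largest first."""
    selected = {name: value for name, value in loadings.items()
                if abs(value) >= threshold}
    return dict(sorted(selected.items(), key=lambda kv: abs(kv[1]), reverse=True))

# Invented loadings on a single component.
pc_loadings = {"vitamin D": 0.62, "PUFA": 0.55, "calcium": -0.10,
               "retinol": -0.48, "fiber": 0.30}
selected = high_loading_nutrients(pc_loadings)
# Keeps vitamin D, PUFA and retinol; calcium and fiber fall below |0.45|.
```

Moving the threshold to 0.50 would drop retinol from the description of this component, which shows how the interpretation of a pattern can shift with this choice.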
A disadvantage of a nutrient-based approach is that nutrients 
are less directly related to dietary recommendations because 
ultimately, nutrient intakes are largely determined by the choice of 
food sources. Since many food sources exist for the same nutrient, 
it is challenging to make food-based dietary recommendations. 
However, our study addressed these challenges. Indeed, the 
integration of standardised 24-HDRs for estimating nutrient 
intakes from a representative sub-sample of our whole study 
population made it possible both to validate the nutrient patterns 
and to identify their main specific food sources. These results confirm 
the growing potential of integrated dietary approaches, which are 
increasingly recommended in nutritional epidemiological studies, 
and stress the need to pursue this still under-explored research area 
[55]. 

Besides, the use of identified nutrient patterns in examining diet-disease 
relationships has been questioned [56]: PCA aims at 
maximising the fraction of variance explained by a weighted linear 
combination of original variables, but the aspects of nutrition 
which are most variable need not be those that are most strongly 
associated with disease. Indeed it could be argued that the most 
variable aspects of human diet could be those that have least 
bearing on health. Despite these limitations, the promising and 
consistent results obtained from this analysis contribute to new 
knowledge and open new research perspectives. 

Conclusions 

This analysis identified four nutrient patterns, and the use of 
two independent and complementary dietary assessment 
tools (FFQ and standardized 24-HDR) enabled their internal 
validation and interpretation in a complex international study 
context. It is anticipated that the proposed approach will facilitate 
the integration of nutrient patterns into multivariate and multi-level 
analyses of dietary exposure (incl. food, nutrient and 
biological/omic patterns) and strengthen the understanding of its 
association with diseases. In addition, this should open new 
perspectives in a research domain still under-explored and 
facilitate internationalization of public health recommendations 
through a better understanding and integration of nutrient 
patterns. 